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Abstract 

In this article we propose multiplication based random walk Metropolis Hastings 
(MH) algorithm on the real line. We call it the random dive MH (RDMH) algorithm. 
This algorithm, even if simple to apply, was not studied earlier in Markov chain Monte 
Carlo literature. One should not confuse RDMH with RWMH. It is shown that they 
are different, conceptually, mathematically and operationally. The kernel associated 
with theRDMH algorithm is shown to have standard properties like irreducibility, ape- 
riodicity and Harris recurrence under some mild assumptions. These ensure basic 
convergence (ergodicity) of the kernel. Further the kernel is shown to be geometric 
ergodic for a large class of target densities on IL This class even contains realistic 
target densities for which random walk or Langevin MH are not geometrically ergodic. 
Three simulation studies are given to demonstrate the mixing property and superiority 
of RDMH to standard MH algorithms on real line. A share-price return data is also 
analyzed and the results are compared with those available in the literature. 
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22 1 Introduction 

23 Suppose tt is a density on a state-space X with respect to some dominating measure A. Most 

24 often the state-space is a subset of the Euclidean space and A is the Lebesgue measure. In 

25 this article we will always assume A to be the Lebesgue measure. Statisticians' main aim 

26 is to study the characteristics of the density tt. Sometimes (say, in Bayesian inference) tt 

27 may be a complicated (possibly unnormalized) density which is not analytically tractable. 

28 So, to study the characteristics of tt, statisticians try to draw a sample from tt. But then 

29 also there may not exist any effective simulation procedure to simulate from tt. Thus the 

30 goal is shifted to draw an approximate sample from tt. Markov chain Monte Carlo (MCMC) 

31 provides a method for doing this. The MCMC methods, quite famous for their effectiveness 

32 in drawing an approximate sample from a target density are widely used. One of the most 

33 famous MCMC algorithms is the Metropolis-Hastings (MH) algorithm (Metropolis et ai, 

34 1953; Hastings, 1970). Given a current state x G X the MH algorithm proposes a new 

35 state y from a proposal kernel density q(x — > y) on X, and accepts it with the acceptance 

36 probability 

37 a [x — > y = mm < r, 1 > 1.1 

38 The corresponding MH kernel has stationary distribution tt(-). 

39 The random walk MH (RWMH) has the proposal kernel q(x — > y) = q(\x — y\). We 

40 notice that generation a point from such a density is same as generating an e from q(e) and 

41 then setting y — x + e. Other algorithms like Langevin MH (LMH), has proposal kernel 

42 N(x + (a 2 /2)Vlog Tr(x),a 2 ) 

43 All of the aforementioned algorithms have certain disadvantages. For example, both the 

44 RWMH and LMH have slow mixing rates in certain cases. The acceptance rate for RWMH 

45 is high if q(-) is concentrated around zero (i.e. small step size) but then a long chain is 

46 required to explore a substantial part of the state space. If a diffused proposal is used then 

47 the acceptance rate drops. For a multi-modal target (which is not known a priori in most 

48 cases) both the RWMH and the LMH chain may remain stuck at one or few of the modes and 

49 may still pass convergence diagnostics. Obviously any inference based on such samples will 

50 be incorrect. Also, an important property like geometric ergodicity which are sufficient for 

51 CLT type results of ergodic averages are either not satisfied by these algorithms (see Roberts, 

52 1999, for examples) or some strong assumption on tt(-) is needed. For example geometric 
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53 ergodicity to hold for RWMH it is necessary (and sufficient) that the target density is log- 

54 concave in the tail (Mengersen and Tweedie, f996). A density tt continuous and positive on 

55 K is log-concave in the tail if there exists an a > and some x\ > such that 

y > x > x\ ==>- log7r(x) — log7r(?/) > a(y — x) 

(1.2) 

y < x < —Xi =>- log7r(:r) — \og7i(y) > a(x — y). 

57 and similarly LMH is also geometrically ergodic under a strong assumption given in Theorem 

58 4.1 of Roberts and Tweedie (1996). These conditions are not satisfied for a large class of 

59 densities (e.g. the densities with thick-tails). 

eo Thus a MH method which allows the proposed state to be far away from the current state 

ei and yet has good acceptance rate will be of much use in the statistical computing problems. 

62 It is even better if the algorithm is geometrically ergodic for a class of densities much larger 

63 than the classes for which this property is enjoyed by the standard algorithms. In this article 

64 we propose a new MH algorithm based on multiplying a random quantity with the states. 

65 Even if the algorithm appears simple we found that it has excellent convergence and mixing 
ee properties. It can explore the state space quite faster than the standard MCMC algorithms 

67 and has geometric ergodicity property for a huge class of target densities, for which the 

68 standard algorithms fails to be geometric ergodic. The main reason for this is because the 

69 dives can be made large or small each with significant probabilities. If the random multiplier 

70 is close to one, then the proposed point will be close to the current state and conversely. In 

71 RWMH, however, this proposal cannot be controlled easily. If the step size is chosen large 

72 then most of the proposed points would be far away from the current states and if the step 

73 size is chosen small then most of the proposed points would be very close to the current 

74 states. 

75 There is obviously one issue with this algorithm - the origin is an absorbing state. How- 

76 ever, this is not vital since in major practical problems the variables are continuous and the 

77 origin has no mass. Thus we can safely remove it from the state space without disturbing 

78 the convergence. We emphasize that the RDMH algorithm exploits the multiplicative group 

79 structure of R — {0}. The algorithm fails when the origin has positive probability attached 
so to it (for example, when the state-space is the set of integers). The RWMH agorithm still 
si work in that case. In other problems such as Bayesian testing with point null hypothesis 
82 and two-sided alternative, where the target distribution has a continuous part and also has 
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83 a mass at zero, both RDMH and RWMH fails. 

84 MCMC techniques being extremely popular, the literature is rich with algorithms - 

85 specialized or generic in nature. Many of them are special cases of MH algorithms with 

86 different forms of proposal densities. We refer to Liu (2008) and Robert and Casella (2004) 

87 for book length discussions. 

as The structure of the article is as follows. We describe the new algorithm in section 2. As 

89 claimed already, the algorithm is new in the sense that it is completely different in concept 

90 and in structure from the available multiplicative random walk MH. This is discussed in 

91 details in section 3. We discuss its convergence properties in section 4. In section 5.1 we 

92 compare RDMH with the standard algorithms. Specifically we consider a bimodal target 

93 and see how RDMH explores the modes while RWMH cannot. We also consider an extreme 

94 mixture example where one of the components has very low dispersion compared to other. 

95 We then consider a thick tailed target for which RWMH is not geometric ergodic while 

96 RDMH is. In this example we see how asymptotic normality holds for the ergodic averages 

97 using RDMH while it fails to hold for the RWMH algorithm.In Section 5.3 we analyze a 

98 share price return data. Typically in such problems the posterior of one or more parameters 

99 are thick-tailed and asymmetric. The data and mode we consider are analyzed in Fernandez 

100 and Steel (1998) using Gibbs sampler. We found that the Gibbs sampler failed to explore 

101 the tail of the posterior of the location parameter while RDMH sampler did that with ease. 

102 We conclude this article with an outlook on further works in Section 6. 

103 2 Random dive MH 

104 Suppose that it is a target density function, probably unnormalized, on K. At each iteration 

105 the algorithm proposes a state y from the current state x by multiplying a random quantity 
we e with x (i.e. y = xe). The proposal is accepted with some probability depending on x and y. 
107 We can classify the proposals into two classes depending on whether |e| < 1 or > 1. We call 
los the case where |e| < 1 an inner dive and the case where |e| > 1 an outer dive. Notice that an 
109 outer dive can also be obtained by dividing the state x by an e with |e| < 1. Hence we can 
no restrict the set from which the random multiplier e is drawn to the set y = (—1, 1) — {0}. 
in Obviously the point zero is not considered so that the chain does not get stuck at zero. At 



Random dive MH 



5 



112 each iteration we can take an inner dive or an outer dive at random. We call the chain 

n3 symmetric if the probability for an inner dive is half and asymmetric otherwise. In this 

H4 article we shall only consider the symmetric RDMH only. So with a proposal density g{e) 

us for e on y, the algorithm is given in Algorithm 2.1. 

lie Algorithm 2.1. Random dive MH on IR 



H7 • Input: Initial value x ^ 0, and number of iterations N. 
us • For t = 0, . . . , N - 1 

n9 1. Generate e ~ g(-) and u ~ U(0, 1) independently 

120 2. If < u < 1/2, set 

/ \ MO, , 1 

121 x = x t e and a(x t , e) = min < — — -e, 1> 

U(^t) J 

122 3. Else set 

123 x — Xt e and a(x t , e) = mm < — — - — , 1 > 

U(^)l e l J 

124 4. Set 

{.x' with probability a(x t ,e) 
x t with probability 1 — a(x t ,e) 

126 • End for 

127 Notice that RDMH is an MH algorithm with 

q(x -+y) = (1/2) g(y/ x )^rl(\y\ < \x\) + (1/2) g(x/y)^I(\y\ > \x\) (2.1) 



129 Hence it follows that 7r(-) is indeed stationary for the chain. However other properties do 

130 not follow easily - the assumptions in general results discussed in Roberts and Rosenthal 

131 (2004) do not hold in this case. We prove them separately in the following section. Notice 

132 also that the terms |e| or its inverse in the acceptance ratios correspond to the Jacobian of 

133 the transformations : x h- > xe and 1 4 i/e respectively. The acceptance ratios are free from 

134 the proposal density g as in RWMH. 

135 For each x^Owe define the inner and outer acceptance regions respectively as, 

136 a(x) = {e G y : n(xe)\e\/-n(x) > 1} 

A{x) = {eey : 7r(x/e)/(7r(x)|e|) > 1} 
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138 Let r(x) = y — a(x) and R(x) = y — A(x) be the potential inner and outer rejection regions 

139 respectively. We see that for each i/O, the rejection probability is given by 

140 p{x) = J (1 — a(x — > y))q{x — > y) dy 

R* 

141 = \ I ^ ~ ^ ~* V ^~^\ 9 (x) dy+ \ S ^ ~ ~^ V ^y* 9 \y) dy ' 

IwKM lvl>M 

142 Substituting e = y/x in the first integral and e = x/y in the second we get, 

143 p{x) = - J (1 — a(x — > xt))g{t) dt + - ^ (1 — a{x — )■ x/t))g{e) de 

r(x) R(x) 

145 Obviously, in any MH algorithm the proposal density plays an important role in terms of 

we convergence. In this case also a good choice of g is needed for faster convergence. However 

147 as we shall see in Theorem 3 that the chain is geometric ergodic under an extremely weak 

148 restriction on g. Hence the discussion on the choices of g is postponed till the end of Section 

149 4. 

150 It is quite straightforward to extend the algorithm to higher dimensions. The variables 

151 may be updated either sequentially or jointly. While updating jointly at each iteration, outer 

152 dives should be applied to a random number of components (which may be zero) and inner 

153 dives to the rest. The Jacobian terms in the acceptance ratios will then be ratios of products 

154 of e's. The algorithm is given in Algorithm 6.1 of Section 6. 



155 3 Dissimilarity from RWMH 

156 It is already seen that the RDMH algorithm is a special case of MH class of algorithms. 

157 However, it is not in any case similar to the random walk type algorithms. The term 

158 multiplicative random walk (also known as the log-random walk MH) is not new in the 

159 statistics literature. However, it has been developed only when the state space is the positive 
wo half of the real line as X t +i = X t exp(N t ) where N t are i.i.d following some distributions on 
lei the real line (see, Dellaportas and Roberts, 2003, pp. 18 for details and Jasra et. ai, 
162 2005, for application). Obviously this reduces to the simple RWMH on IR by observing that 
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163 logX t+1 = \ogX t + N t . It is useless for problems having entire real line as support because 

164 a part of the state space is never visited (i.e the chain becomes reducible), that is, positive 

165 (negative) initial values restrict the chain to take only positive (respectively negative) values, 
we The RDMH, however, is developed when the state sapce is entire real line. The term 

167 exp(iV t ) above can not be equal to e t since e t can take both positive and negative values. 

168 Hence the RDMH algorithm cannot be considered as a special case of log-random walk MH. 

169 For distributions with (0, oo) as support the obvious way to emply RDMH is to reparametrize 

170 by taking logarithm so that the support becomes R. 



171 4 Convergence Properties 

172 Let us denote the kernel the of the RDMH chain by Kix — > y). Important properties like 

173 irreducibility and aperiodicity (see Roberts and Rosenthal, 2004, for definitions) are satisfied 

174 by the RDMH under minor assumption: 

175 Theorem 1. If for all < 5 < 1, inf < 5 < | £ | <1 g(e) > and7r(-) is bounded and positive on every 

176 compact subset o/R, then the chain is X-irreducible (and hence it -irreducible) and aperiodic. 

177 Proof. Suppose x ^ and A G S(R) has \{A) > 0. Then there exists a compact set 

178 C = {u : r < \u\ < R}, such that X(A n C) > where < r < \x\ < R < oo. Define, 

179 I x = [—\x\, \x\] A* = A n C m = inf ir(y) and M = sup7r(y) 

180 and a = inf r /R<\ e \<i g{t)- The kernel K of the chain satisfies, 

K(x,A) > [ q(y\x)mm{ * { y\ q{ y^ X \ ,l)dy 

A 

182 > / — — -giy/x) min \ ^ — , llcfa/ 

J 2\x\ V^\ x ) x ) 

A*m x 



A*f]IZ 



where 



a r mr 1 w . ra r mr l , . . . 

> — m in \ , 1 \ MA r\I x ) + — r mm <^ , 1 } X(A n I c ) 

~ 2R IMR J V xJ 2R 2 I MR J V xJ 

> c X(A n C) > 



f a ra ~] f mr ~\ 
c = min < — , — - > x min < , 1 > 
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we Thus we see that, the chain is X-irreducible. Further since each measurable set with positive 

187 Lebesgue measure can be accessed in a single step implies the chain is aperiodic. □ 

las Thus the RDMH chain is ergodic and hence by Theorem 4 of Roberts and Rosenthal 

189 (2004), 

\\K n (x,-) - tt(-)\\ tv -> asn^oo (4.1) 

191 for 7T— almost every i£]R. Where , — v<i\\tv is the well-known total variation distance 

192 between two probability measures v\ and u 2 , defined as 

193 Iki — v 2 \\tv = sup \v x {A) - u 2 (A)\ (4.2) 

A 

194 For further properties of the total variation distance see Meyn and Tweedie (1993); Roberts 

195 and Rosenthal (2004); Robert and Casella (2004) or Liu (2008). A sufficient condition for 
we (4.1) to hold for all x rather than 7r-almost every x is the Harris recurrence of the chain. A 

197 Markov chain (X n ) is called Harris recurrent if for every x in the state space and for every 

198 set B E B(R) such that ir(B) > 0, 

199 P(3n : X n e B\X = x] = l. 

200 Obviously the point zero creates a problem in our RDMH algorithm. However, if we remove 

201 the single point zero from the state space, then since the chain is already 7r-irreducible we 

202 use Lemma 7.3 of Robert and Casella (2004) to conclude that 

203 Corollary 1. Under the assumptions of Theorem 1, the RDMH chain is Harris recurrent 

204 on l* = R-{0}. 

205 Thus (4.1) holds for any nonzero x, i.e. any nonzero starting value ensures convergence 

206 of the chain. 

207 A subset C of X is called small if there exists a positive integer n, a number 5 > and 

208 a nontrivial measure v such that 

209 K n (x, A) > 5 u(A) Vx 6 C, VA e B(X) (4.3) 

210 We will characterize the small sets for RDMH. In most of the MH algorithms any bounded 
2n subset of the state space is small. However this is not the case with RDMH. We first state 
212 a result for RDMH kernel useful in characterizing the small sets. 
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213 Lemma 1. Suppose (x n ) is a sequence of positive (negative) numbers decreasing (resp. in- 

214 creasing) to zero, then K(x n , •) — > 5q, where 5q is the distribution degenerated at zero. 

Proof. Without loss we assume (x n ) | 0. Suppose y < 0. Then 

1 '° 



K(x n ,(-oo,y]) < - / g(e)de 0. 

1 Jx n /y 

Also for y > 0, for sufficiently large n, 

K(x n , (y, oo) < - j #(e)de ->• 0, 

215 so that K(x n , (— oo,y]) — > 1. This completes the proof. □ 

216 Theorem 2. Suppose the conditions in Theorem 1 holds. Then a set E is small if and only 
21? if its closure, E is a compact subset o/R* = R — {0}. 

Proof. Suppose first that E is compact subset of R*. Then 
1 

r := - x inf{|x| : x G E} > and R = 2 x sup{|x| : x G -E} < oo. 

218 So letting C = {u : r < |u| < R}, it is seen from the proof of theorem 1 that, 

219 K(x,A) > c X(AnC), VxeE,VAeB(R) 

220 Since \c{A) := X(A D C) is a nonzero measure on (R, 25(IR) ) , this shows that E is small. 

221 Now suppose that E is small. Clearly ±oo cannot be limit points of E since for any fixed 

222 n and bounded A, K n (x,A) — > as \x\ — > oo. We shall also show that zero cannot be a 

223 limit point of E. This will show that E is compact subset of R*. So suppose on the contrary 

224 that zero is a limit point of E. Then there exists a sequence (x n ) in E which monotonically 

225 converges to zero. Hence for any m G N and any measurable set A, by Lemma 1, 

226 K m (x n , A) = J K m ~\y, A)K(x n , dy) — >• 1(0 G A) as n ^ oo. 

w 

227 So that (4.3) cannot hold for all x G E and all A contradicting the assumption that E is 

228 small. □ 

229 We now turn towards geometric ergodicity. An irreducible Markov kernel K (irreducible 

230 with respect to some a— finite measure v) with invariant distribution tt(-) is said to be 

231 geometric ergodic if 

232 sup \K n (x,A) - tt(A)\ < M(x)p n , VnGN 

AeB(M) 
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233 for some p < 1, where M(x) < oo, for tt— a.e. x G EL 

234 Geometric ergodicity is important in MCMC applications for the CLT of ergodic averages 

1 N 

i=l 

236 of some function h evaluated at each state of the Markov chain (Xi). Corollary 2.1 of Roberts 

237 and Rosenthal (1997) (based on work of Kipnis and Varadhan, 1986) states that if a Markov 

238 kernel P is geometric ergodic and reversible then for any function h on the state space such 

239 that EttI^I 2 < oo 

y/N(h N - 7r(h)) N(0, <rl) as iV -> oo 

241 Such a CLT easily may not hold if the kernel is not geometric ergodic (see Roberts, 1999, 

242 for examples) or Section 5.2 of this article. Geometric ergodicity has multifarious usefulness 

243 discussed in Jones and Hobert (2001) and Roberts and Rosenthal (1998). 

244 To show that RDMH chain is geometrically ergodic, we put the following restriction on 

245 71. 

246 Assumption (Al). For some 1 < p < oo 

247 lim 7i(x)/n(xe) = \e\ p (4.4) 

\x\— >OD 

248 where the notation |e|°° should be interpreted as for each e G y. Notice that this is true 

249 for most of the posterior densities in Bayesian literature where MCMC finds extremely high 

250 applications. The condition (Al) given above is basically a regularly varying type restriction 

251 on 7i. Further discussion on regularly varying functions can be found in Feller (1971, pp. 

252 275-284). 

253 Recall that for each s^O, the rejection probability is 

r(x) R(x) 

255 We now give a bound on p(x) in the following lemma. 

256 Lemma 2. Assume (Al). Then 

257 p(x) — > - J (1 — lel^ 1 )^)^, as \x\ — > oo 

y 



p(x) ->• \ J i l ~ l e IM e )de, as x ->• 



y 
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259 Proof. Notice that as \x\ — > oo, r(x) — >■ 0, — > 3^ and as x — > 0, i?(a;) — )■ 0, r(a;) — > y. 

260 Hence the result follows from dominated convergence theorem. □ 

261 Now we state a helpful result without proof. 

262 Lemma 3. Fix p > 1. For each e E y and s E (0, 1) define 

263 f(s,e) = |e| s + |e| 1_s - N 

ip p (s,e) = \e\ ps + lel^ 3 - 1 - \e\P- 1 

265 With ^oo^e) = 0. T/ien 

266 (a) ip(s, e) < 1 for all e E y and s E (0, 1). 

267 (b) ip P {s,e) < 1 for alle Ey andO < s < 1/2 - l/(2p). 

268 As discussed in Theorem 15.0.1 of Meyn and Tweedie (1993), the RDMH chain is geo- 

269 metric ergodic if and only if for some small set E, some function V : R* — > [1, oo) which is 

270 finite at least for one x, 7 < 1 and some &e < 00, the geometric drift condition holds: 

KV(x) < >yV(x) + h E l{x E E), (4.5) 

272 where KV(x) = J R , K(x — > y)V(y)dy. In our case, we know any compact set of M* is small. 

273 So if we can show for some continuous V : M* — > [1, 00) which is bounded on every compact 

274 subset of M*, the following conditions hold: 

KV(x) KV(x) 

275 limsup— — — — - < 1 and limsup— — — — - < 1 (4.6) 

N ^oo V(x) x ^ V{x) 

276 then we can choose a number 7 < 1 and a small set E = {x : r < \x\ < R} for some 

277 < r < R < 00, such that KV(x) < jV(x) for all x ^ E. Also b E = sup xeE KV(x) < 00 

278 since V is bounded on E. Hence we see that (4.5) holds. 

279 We now state and prove the most important theorem of this section. 

280 Theorem 3. Suppose that conditions in Theorem 1 holds together with continuity of ix(x) 

281 and (Al). Further assume the following: for some s E (0, 1) 7 

1 

\e\- So g(e) de < 00 (4.7) 



-1 



283 Then the chain is geometrically ergodic. 
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284 Proof. In view of discussion preceding the statement of the theorem we only need to show 

285 (4.6). Fix < s < min{s , 1/2 - l/(2p)}. Then by (4.7) 

l 

\e\- s g(e) de < oo (4.8) 



287 Now Notice that, for each i^O, and any function V : R* — > [1, oo) 
KV{x) 1 f . ,V(xe) J 1 f . ,V(x/e) J 

2 ,/ yw 7r(x) 1 1 V^(ar) 2 J yK ' ir(x)\e\ V(x) HK J 

r(x) R{x) 

289 Choose positive constants ci, c?; and c" such that the function 

V(x) = c 1 7r(a;)~ s I(|s| > 1) + c* 2 x~%0 < x < 1) + c;*(-a;)" s I(-l < x < 0) 

291 is continuous 1 and V(x) > 1 for all x 7^ and let C2 = min-fcg, Cg*} and C2 = max{c2, c™}- 

292 We now work with this V to show (4.6) 

293 Case I: Suppose \x\ — > 00. Then assumption (Al) implies that A(x) — > <fi and r(x) — > (p. 

294 Notice in this case 

296 Hence by (4.8) the second integral in (4.9) converges to zero. Also, since V e G r(x) 

7r(xe). ,V(xe) ^ J /vr(a;e)\ 1-5 7r(xe) . , ,,. ,,Ci 
■ 1 < max < \ / e , /,/ e x e " 



tt(x) V(x) I \ k(x) J ' 7r(x) 1_ ' s C2 

< max j |e| s ,M s - 
l c 2 

298 where M = sup7r(x), the third integral in (4.9) also converges to zero. Further since 

299 V(xe) /V{x) — > \e\ ps it follows from (4.9) and Lemma 2 that 

1 1 

limsup ^y^ < (1/2) [ \e\ ps g(e)de + (1/2) [ W^- 1 g(e)de 

l^l^oo V(X) J J 



1 

p— 1\ 



+ (1/2) (1 - \er')g(e)de 



-1 



1 

If 1 

2 / ii P (s,e)g(e)de+ - 



1 



< 1 by Lemma 3 



1 The following choices do the job: sup^i^ ir(x) s < c\ < 00, = citt(1) s and c^* = Qn(—1)~ 
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304 Case II: Suppose now that x — Y 0. In this case a{x) — > <p an d — > 0. Also 

305 y(xe)/y(a;) < ^-|e|~ s for all |x| < 1 implies that the first integral in (4.9) converges to 

306 zero by (4.8). Also V |x| < 1 and V e G R(x) 



7r(x)|e| V(x) \ ' 7r(x) C\ J [ ' Ci 

308 where m = inf 7r(x) > 0, so that the fourth integral in (4.9) converges to zero. Hence by 

309 continuity of tt(-) at zero, 

i i i 

Hmsup^^ < l - J \e\ s g{e)de + ) i J \e\ 1 - s g(e)de+ l - J {l-\e\)g{e)de 

-l -l -l 

311 = \ j ^^^ g ^ de+ \ 

-1 

312 < 1 by Lemma 3 

313 This completes the proof. □ 

314 Remark: It is conjectured in Atchade and Perron (2007) that an MH chain is ge- 

315 ometric ergodic if the rejection probability is bounded away from 1. They have proved 

316 the result with an additional assumption that the continuous part of the MH kernel, i.e. 

317 a(x — > y)q{x — > y)dy which is an operator on L 2 (tt), is compact. Unfortunately this extra 

318 assumption does not hold for RDMH (along with most of the MH algorithms). Had the 

319 conjecture been proved, we could have claimed readily that RDMH is geometrically ergodic 

320 by using Lemma 2. This would not require the extra assumption (4.7) on g. 

321 The class of densities satisfying (Al) together with the assumptions of Theorem 1 and 

322 continuity is quite large. This class obviously includes the following classes: 

323 1. The class of thick-tailed densities n(x) ~ — — — — as \x\ — ?■ oo where p(x) is a polynomial 

p{x) 

324 satisfying p(x) > for all sufficiently large \x\. For example, the t-densities fall in this 

325 class. 

326 2. The class of densities which are equally log-concave in the two tails, i.e., for some 

327 M > and some a > 0, 

328 \y\ > \x\ > M =^ log7r(x) - log7r(y) > a(\y\ - \x\) (4.10) 
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329 This is a stronger version of (1.2) for (4.10) implies (1.2). Notice that for these densities, 

330 p = oo in (Al). Examples of such densities are the normal densities and their mixtures, 

331 double exponential density etc. 

332 3. The class of densities of the form h(x) exp(— k ^\x — 8\) where m > 1. 

333 In some problems, however, the target tc(x) is log-concave in the tails but the rates at 



334 which 7i (x) converges to zero are not same for the two tails. One example of such densities 

335 is f(x) = exp(x — exp(x)). Notice that if Y follows the standard exponential distribution 

336 £xp(l) then X = logY has density f(x). It can be seen that f(x)/f(xe) — > holds for each 

337 e > 0, as |x| — > oo and also for each e < and as x — > oo. But f(x)/f(xe) — > oo if e < and 

338 x — > — oo. We assume for these kind of densities exactly one tail dominates, i.e. exactly one 

339 the following is true for each e G (—1, 0). 

340 lim 7i(x)/7i(xe) — > oo and lim 7r(x)/7T(xe) — > (4-H) 

X— > — oo x— »oo 

341 lim 7t(x)/tt(x€) — > and lim 7r(x)//T(xe) — > oo (4-12) 

x— >— oo x— >oo 

342 Notice that it is sufficient to work with (4.11) because if (4.12) holds for a target ir, then 

343 ?t(— x) satisfies (4.11). For these kind of densities (Al) does not hold. However, the next 

344 theorem assures that the RDMH chain is still geometric ergodic. The proof is along the line 

345 of Theorem 3 and so we just present a sketch. 

346 Theorem 4. Suppose tt is continuous and the assumptions in Theorem 1 holds. Suppose 

347 further that 7r(x) satisfies (4.11) and that g satisfies the regularity condition (4.7). Then the 

348 RDMH chain is geometric ergodic. 

349 Proof. Notice that in this case we have the following as x — > oo we still have a(x) — > y and 

350 r(x) — > <p. But 

351 A(x) n (0, 1) (j) A{x) n (-1,0) (-1,0) 

352 and 

R(x) n (o, l) (o, l) r(x) n (-1, o) (p. 

354 Thus in 4.9 (with the same choice of V as in theorem 3) we can further split the integrals 

355 on intersections of the domains with (—1,0) and (0,1). On each such domain either the 
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356 integrand converges to zero or it is bounded and the domain of the integral converges to the 

357 empty set. Hence 

KV(x) 

358 limsup— — — — - < limsupp(x) < 1/2 

X— >oo V \X) x—*oo 



w - / (i-^)*>*+ / ('-^)*>* 

r(cc)n(0,l) r(x)n(-l,0) 

H(cc)n(o,l) R(x)n(-i,o) 
l 



J g(e)de < 1 as x — > oo 



363 Also as a; — oo, it can be seen that — )■ and i?(:r) — > y hold but 

364 a(x) n (0, 1) -»> (0, 1), a(rr) D (-1, 0) -»> 



and 



367 Hence similarly, 



369 since in this case 



r(x) n (0, 1) <f> r(x) n (-1,0) (-1,0) 



limsup < limsupp(x) < 1 

X— »oo » ^ 2^ J x— >oo 



r(x)n(0,l) r(x)n(-l,0) 

fi(a)n(0,l) fl(a:)n(-l ) 0) 
1 



— >■ 2 J g(e)de + J g(e)de < 2 as x — > — oo 



-1 o 

373 This verifies the first condition of (4.6). Verification of the second condition of (4.6) is 

374 already done in case II of Theorem 3. □ 

375 We now return to the choices of g. Let B(x; a, b) denote the density of a Beta(a, b) random 

376 variable. A general class of proposal densities satisfying (4.7) is then given by 

g{e) = 7#(-q oi, 6i)I(-l < e < 0) + (1 - 7 )B(e; a 2 , 6 2 )I(0 < e < 1). 
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378 for some < 7 < 1 and some positive numbers ai,a 2 ,bi,b 2 . Notice that a straightforward 

379 choice of g is uniform distribution over (-1,1) which corresponds to the case 7 = 1/2 and 

380 di — a 2 — b\ — b 2 = 1- This indeed allows for large dives but the acceptance rate may 

381 drop. Further if the simulated e is very close to zero and a outer dive is taken, then the 

382 proposed state will have large magnitude and result in numerical instability. Specially when 

383 the posterior is highly steep then such large dives are not sensible. We shall nevertheless use 

384 this choice of g in the next section and show it works well. If however, the target density is 

385 steep then a good idea would be to generate e's close to 1. This can be achived by 

386 1. making 7 small (e.g. 7 < 0.2) and 

387 2. making a 2 high and b 2 small (e.g. a 2 > 2 and < b 2 < 1). 

388 5 Application 

389 In this section we consider two simulation studies. We first consider a bimodal target density 

390 and show how the RDMH algorithm explores the modes but the RWMH chain either gets 

391 stuck at the mode (if the proposal variance is moderate) or explores the modes at high value 

392 of proposal variance but has very low acceptance rate. In the next example we consider 

393 another simulation study on a thick tailed target. The RWMH and the LMH algorithms 

394 are not geometrically ergodic for this target under any kind of proposal (thick-tailed or thin 

395 tailed) and this has a serious effect when we try to construct a confidence set based on 

396 asymptotic normality of ergodic averages - for the latter does not hold in this case. However 

397 the RDMH is still geometric ergodic and a CLT holds for the ergodic averages. 

398 5.1 Exploring a multimodal target 

399 Example 1. Consider the mixture distribution 

400 ir{x) = 0.5 (f)(x; 0, 0.25) + 0.5 <f>(x; 10, 0.25) 

401 where tp(x; fx, a) = exp (— 0.5(x — fx) 2 /a 2 )) / (ay/2ir) is the normal density with mean fx and 

402 variance a 2 . 

403 Clearly this is a bimodal distribution with two separated modes at x = and x = 10. 

404 We compare the RDMH with the RWMH here. We choose g, as the uniform distribution on 




Figure 5.1: Histograms and True density(solid curve) based on 30,000 sample values out of 50,000 
sample values (burn-in = 20,000) for the bimodal example in section 5.1 

(a) RWMH: r = 2, initial value: x = acceptance rate = 29.916%, (b) RWMH: r = 2, initial 
value: xo = 10 acceptance rate = 29.656%, (c) RWMH: r = 5, initial value: xq = 10 acceptance 
rate = 14.31%, (d) RDMH, initial value: xo = —2, acceptance rate = 30.172% 

405 y= (-1,1) -{0} i.e. 

406 g (e) = 1/2 - 1 < e < 1, e ^ 

407 For the RWMH, we choose the proposal q{x'\x) = 4>(x';x,t 2 ) for different choices of 

408 t. Figure 5.1 (panels (a) and (b)) shows that RWMH remains stuck at one of the modes 

409 for an arbitrary but reasonable choice of r, and with arbitrary initial value. This indicates 

410 significant non-robustness of RWMH with respect to the initial value and the choice of r even 

411 if it is geometric ergodic in this case. Only when r has been appropriately chosen, RWMH 

412 performs adequately (Figure 5.1, panel (c)). We remark that such "right" choice is possible 

413 only if bimodality of the target posterior is anticipated beforehand, which is unrealistic. Even 
4u for the appropriate choice of r we notice that the acceptance rate of RWMH is rather small 

415 (14.31%). In contrast RDMH adequately explored the entire state space without requiring 

416 knowledge of the target density (Figure 5.1, panel (d)), or tuning of the proposal. The 

417 acceptance rate, which is 30.172%, much encouraging compared to the RWMH algorithm. 
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418 Example 2. Now we consider a more challenging case similar to one considered by Chen 

419 and Kim (2006, p. 1632-1634) in their Example 2. The target is a mixture of univariate 

420 normals: 

?r(x) = 0.5 (j)(x; 0, 10~ 4 ) + 0.5 5, 1) 

Several specialized algorithms are available for such needle-in-haystack problem among which 
the equi-energy sampler by Kou et. al. (2006) is worth mentioning. The equi-energy sampler 
is extremely efficient once the tuning parameters are chosen carefully. For our purpose we 
chose few proposals at random and study their performances. In particular, we run each 
chain (of length 30,000 each after discarding first 20,000 burn-ins) 100 times and estimate 

1 30000 

P = V l(\X t \ < 0.05) 

30000 Vl ' ; 

i=i 

422 , where X t denotes the Markov chain. From Table 5.1 it can be seen that all the proposals 

423 work quite well. The third and fifth proposal results small M.S.E's perhaps due to the fact 

424 that they put more weight near zero (the multiplier is close to zero and hence so is the 

425 proposed state) and one of the mode is at zero. However, since in practice, it need not be 

426 the case it might result in poor acceptance rates. So, a proposal that generates random 

427 multiplier close to 1 should be preferred. 



Proposal 


E(P) s.d.(P) MSE(P) Avg. accep. rates 


U(-l,l) 

0.15#(-e; 1, 1) + 0.85£(e; 1, 1) 
0.15B(-e; 0.5, 1) + 0.85£(e; 0.5, 1) 
0.15B(-e; 1, 0.5) + 0.85£(e; 1, 0.5) 
0.5S(-e; 0.5, 0.5) + 0.5£(e; 0.5, 0.5) 


0.5115 0.1361 0.0184 37.14% 
0.4953 0.1218 0.0147 41.08% 
0.4938 0.0470 0.0022 28.51% 
0.4912 0.1767 0.0310 56.79% 
0.5024 0.0520 0.0027 37.91% 



Table 5.1: Mean, standard deviation and mean squared errors of P for different proposals. 
The average acceptance rates are also reported. 
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5.2 Exploring a thick-tailed target 



429 In this section we consider a thick-tailed target density 7r for which RWMH and LMH are 

430 not geometrically ergodic but RDMH is. We chose 



ir(x) 



1 



7T 1 



X 



2\2 



(5.1) 



432 It is easy to verify that E^|X| 2 < oo. Any other thick tailed density or a density which is 

433 not log-concave in the tail could have been chosen in place of (5.1). 

434 Notice that Vlog7r(x) — > as \x\ — > oo. So n cannot be log-concave in the tail and 

435 hence the RWMH chain is not geometrically ergodic. Moreover, Theorem 4.3 of Roberts and 
Tweedie (1996) assures that the LMH is not geometrically ergodic either. 

Table 5.2: P-values of tests for normality performed on the samples of means. 





Acceptance 






P-value of 




Algorithm 


rate (%) 


s.e. 


AD 


CVM 


Lill 


RDMH 


66.43 


0.0074 


0.8242 


0.8241 


0.5737 


RWMH (JV(0, 1.5 2 )) 


46.99 


0.0298 








RWMH (C(0,1)) 


44.95 


0.0182 








LMH (scale = 2) 


87.90 


0.1767 









LMH (scale = 3) 


79.39 


0.5460 








LMH (scale = 4) 


78.17 


0.8631 









Our parameter of interest was E n X. We compared the RDMH algorithm with the RWMH 
and the LMH algorithms. For RDMH proposal g we again chose the uniform distribution 
over (—1, 1). For RWMH, however we chose two proposals - one thin-tailed and one thick- 
tailed. The the thin-tailed proposal is a normal distribution with mean zero and variance 
1.15 2 (A/"(0, 1.5 2 ) while thick-tailed proposal is the standard Cauchy distribution (C(0,1)). 
The scale paramters for the LMH were chosen to be 2, 3 and 4. For each of the algorithms, 
we ran 1000 independent chains of lengths 50,000 each and obtained the means of last 
40,000 values of each such chain. Thus we obtained six samples of estimates of E n X each 
of which had size 1000. Three tests were performed on each of these three samples in order 
to quantitatively assess the normality behavior. The three tests were the Anderson-Darling 
(AD) test, Cramer-von Mises (CVM) test and the Lilliefors (Lill) test for normality. The 
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Table 5.3: Performances of the algorithms in terms of Kolmogorov-Smirnov distances. 



Method: 


RDMH 


RWMH 


LMH 


A/"(0,1.5 2 ) 


C(0,1) 


Scale = 2 


Scale = 3 


Scale = 4 


KS value 
p- value 


0.0202 
0.7997 


0.0788 
0.0338 


0.0647 
0.0365 


0.3682 



0.4665 



0.5000 




448 descriptions of the tests can be found in Thode (2002). The tests were performed by nortest 

449 package of R-statistical software. The p-values together with the average acceptance rate of 

450 each of the 1000 chains and standard error of each of the six samples of empirical means are 

451 reported in Table 5.2. The QQ-plots are shown in Figure 5.2 and the auto-correlation plots 

452 for a typical run of the samplers are shown in Figure 5.3 (no thinning). 

453 It is seen both from the p-values and the QQ-plots that normality holds for the empirical 

454 means obtained by RDMH algorithm while in the RWMH and the LMH algorithms they are 

455 far from normality. 

456 Next, to judge how fast RDMH converges in this scenario we conducted a further study. 

457 For each of the algorithms we ran thousand independent chains of length 1000 each and 

458 calculated d(F, F ) - the Kolmogorov-Smirnov distance between the empirical c.d.f (F) and 

459 the true c.d.f. 

1 11.. 

460 Fn(x) = — arctanx H 1 sin(2 arctanx) 

461 by the formula 

d(F,F )=suv\F(x)-F {x)\ 

xgR 

463 and also obtained the p-values for testing H : F = F against the two sided alternative. 

464 Table 5.3 reports the averages of the Kolmogorov-Smirnov distances and the p-values from 

465 the 1000 independent chains. This shows that the RDMH chain converges much faster 

466 compared to the RWMH and the LMH algorithms. 

467 5.3 Share price return data 

468 In this section we consider the daily price returns of Abbey National share between July 31 

469 and October 8, 1991. The data is presented in Table 1 of Buckle (1995). We consider the 

470 simple location-scale model proposed and analyzed in Fernandez and Steel (1998). Let p^, i = 
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(2) 



(3) 




Theoretical Quantiles 






Theoretical Quantiles 



Theoretical Quantile 



(5) 



(6) 





Figure 5.2: QQ-plots of six samples of empirical means. (1) : RDMH; (2) : RWMH with 
JV(0, 1.5 2 ) proposal; (3) RWMH with C(0, 1) proposal; (4) - (6) : LMH with scales 2, 3 and 
4 respectively 




(2) 



31 




n 



Figure 5.3: Auto-correlation plots of six samplers. (1) : RDMH; (2) : RWMH with Af(0, 1.5 2 ) 
proposal; (3) RWMH with C(0, 1) proposal; (4) - (6) : LMH with scales 2, 3 and 4 respectively 
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471 0, ... ,49 denote the price data in Table 1 of Buckle (1995) and jji = (pi — Pi~i)/pi-i, i 

472 1, . . . ,49. Fernandez and Steel (1998) modeled the data as follows: 



p(yi,...,y n \P,<r, v,i) 



7 
n r 



x r((i/ + i)/2) 

(yi - /5) 2 r i 



a 



X 



n 

i=i 



i + 



l(y t > f3)+ 1 %y t <f3) 



>+l)/2 



(5.2) 



474 with independent priors on the parameters as follows: 

475 p(/3) = 1 ; p(a) = l/a 

p(z/) = dexp(-diz) ; p(<f>) = b a Y (a)' 1 <p a ~ l exp(-60) 

477 where = r y 2 . The hyper-parameters are given by Fernandez and Steel (1998) as d — 

478 0.1, a = 1/2 and b = We log-transform all the parameters except j3 so that the state 

479 space becomes R 4 . That is, we re-parametrize: a = logo", v = logu and 7 = log 7. We 

480 updated the parameters (/3, a, P, 7) sequentially with the following proposal densities: 



08 (e) 
99(e) 

9^) 



0.80S(e; 2, 1)1(0 < e < 1) + 0.20i3(-e; 2, 1)I(— 1 < e < 0) 
0.80S(e; 3, 0.5)1(0 < e < 1) + 0.20B(-e;3, 0.5)I(-1< e < 0) 
0.80S(e; 3, 0.5)1(0 < e < 1) + 0.20B(-e;3, 0.5)I(-1 < e < 0) 
0.80S(e; 2, 0.5)1(0 < e < 1) + 0.20£(-e; 2, 0.5)I(-1 < e < 0) 

\6-1t 



where B(e;a,b) is the Beta density proportional to e (1 — e) 1(0 < e < 1). We tried 
couple of other such mixtures too and the results were very close. Using a proposal density 
uniform on (—1, 1) is not a good idea in this discussed before. 

Fernandez and Steel (1998) used a Gibbs sampler approach with data-augmentation. 
They faced some numerical difficulties and perturbed the yi's slightly to resolve the numerical 
problems. The RDMH sampler, however, did not face any numerical problem. The results 
obtained by RDMH differs from the Gibbs sampler perhaps due to this reason. Actually, the 
posteriors of a, v and 7 were same whether we used Gibbs sampler or not (see the paper by 
(Fernandez and Steel, 1998) for the Gibbs sampler output). The posterior of /3 were quite 
dissimilar for the RDMH and Gibbs samplers. For the Gibbs sampler the posterior was 
mainly concentrated between —0.016 and 0.002 while it was concentrated between —0.005 
and 0.015 for the RDMH chain. Clearly, the Gibbs sampler fails to cover the long tail of the 
posterior of (3 while the RDMH explores it quite easily. 
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Figure 5.4: Density histograms of the posteriors in share price return data example. 

Traceplot of Bela Traceplot of log(Sicma) 




—r- 

200 



—\ 1 r 

600 800 1000 



6 
S 1 



Index 




Traceplot of log(nu) 



Traceplot of log(gamma) 




1 

200 



-r 



-r 



400 600 
Index 



BOO 1000 




— I — 

200 



— r 

400 



-T 



600 
Index 



aoo 1000 



Figure 5.5: Traceplots of the the last thousand iterations of the RDMH chain for the share 
price return data example. 
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Figure 5.6: Auto-correlation function of the RDMH sampler. 



Parameter 


RDMH 


Gibbs 




mean s.d. 


mean s.d. 





0.0066 0.0029 


-0.0068 0.0028 


a 


0.0091 0.0018 


0.0091 0.0018 



Table 5.4: Posterior summaries for f3 and a for the RDMH and Gibbs chains. The result for 
the Gibbs chain are taken from Fernandez and Steel (1998). 



Parameter 


mean(RDMH) 


s.d. (RDMH) 


V 


8.0119 


7.0520 


7 


0.6745 


0.1408 



Table 5.5: Posterior summaries for v and 7 for the RDMH chain. 
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498 To ensure we also ran a random walk MH sampler and found that the results for the 

499 RWMH sampler coincided with that of the RDMH sampler. The summaries of the RDMH 

500 sampler is given in Table 5.4 and 5.5 and the histograms and traceplots of the same are 

501 given in Figure 5.4 and 5.5 respectively. The autocorrelation plots of the RDMH chains are 

502 given in Figure 5.6. We ran the sampler for 160,000 iterations and discarded the first 10,000 

503 samples as burn-ins. We then thinned the remaining 150,000 samples by 5. Convergence 

504 was achieved much earlier though. We also found that the mixing for the RDMH sampler 

505 was superior to that of the RWMH sampler. 



506 



507 



508 



6 Further works 

We conclude this article with some purview of possible extension to higher dimension. Sup- 
pose 7r is a density supported on IR fc and g is density on y k . Then the algorithm is given in 



509 Algorithm 6.1. 



Algorithm 6.1. Random dive MH on 



512 



Input: Initial value x( ) with no component equal to 0, and number of iterations N. 

For t = 0, . . . , N - 1 

1. Generate e ~ g(-) and Ui ~ U(0, 1), i — 1, . . . , k independently 

2. For each i if < Ui < 1/2, set x\ = xf^ei. else set x\ = xf' /e» 

3. Let / = {1 < % < k : u { < 1/2} and set 



a x,e =muW ) 7 ^ iel , 1 \ 



4. Set 

x' with probability a(x^\e) 

with probability 1 — a(x^\e) 

End for 



x (m) 



520 
521 
522 



This algorithm is still irreducible and aperiodic. It is also Harris recurrent on W k and 
every compact subset of lR* fc is still small. The proof is along the same line as Theorem 1 
and 2. Geometric ergodicity is, however, a property that requires a different approach. It is 
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523 expected that geometric ergodicity of this algorithm still holds for a large class of densities 

524 (especially the thick-tailed ones) on higher dimensions. We hope that this article would 

525 draw attention of the researchers and the question regarding geometric ergodicity in higher 

526 dimension situation would be settled. 

527 The proposal density g(e) on y k can be chosen to be the product of proposal densities 

528 on y. In such a case, one should choose the univariate proposals which generate e's close 

529 to 1 with high probabilities each (for example, the mixture proposals in Section 5.3). This 

530 will ensure that the proposed states are not too far away from the current state (in R fc ) to 

531 reduce the acceptance rate significantly. 
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